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ABSTRACT: This and a companion paper propose techniques for constructing parametric mathematical mod- 

els describing key features of the distribution of an output variable given input-output data. By contrast to stan- 
dard models, which yield a single output value at each value of the input, Random Predictors Models (RPMs) 
yield a random variable at each value of the input. Optimization-based strategies for calculating RPMs having a 
polynomial dependency on the input and a linear dependency on the parameters are proposed. These formula- 
tions yield RPMs having various levels of fidelity in which the mean and the variance of the model’s parameters, 
thus of the predicted output, are prescribed. As such they encompass all RPMs conforming to these prescrip- 
tions. The RPMs are optimal in the sense that they yield the tightest predictions for which all (or, depending on 
the formulation, most) of the observations are less than a fixed number of standard deviations from the mean 
prediction. When the data satisfies mild stochastic assumptions, and the optimization problem(s) used to calcu- 
late the RPM is convex (or, when its solution coincides with the solution to an auxiliary convex problem), the 
model’s reliability, which is the probability that a future observation would be within the predicted ranges, can 
be bounded tightly and rigorously. 


1 INTRODUCTION 

Metamodeling refers to the process of creating a 
mathematical representation of a phenomenon based 
on input-output data. These models can have a para- 
metric (e.g., polynomial response surfaces, smooth- 
ing spline models, polynomial chaos expansions) or 
non-parametric structure (e.g., Kriging/Gaussian pro- 
cess). In the former case the analyst first prescribes 
the model structure and then determines the value 
of the model parameters such that a measure of the 
discrepancy between observations and predictions is 
minimized. This step is commonly referred to as 
model calibration or regression. Model-form uncer- 
tainty, measurement noise, and numerical error often 
inhibit confidently prescribing a fixed constant value 
for such parameters. Consequently, a family of pa- 
rameter values is prescribed such that the predictions 
resulting from evaluating the computational model at 
any family member accurately represents the obser- 
vations. 

Several model calibration techniques are available 
in the literature. Many of them assume the structure 

y = M(x,p) + 77 , (1) 

where y E M" 1 ' is the output, M is the computational 
model, x E is the input, p E R" 7 ' is the parame- 


ter, and 77 G R is a random variation caused by noise 
and measurement error. Many model calibration tech- 
niques are based on this structure, the assumption of 
p being a fixed but unknown constant (i.e., the uncer- 
tainty inp is epistemic), and the assumption of 77 being 
independent and identically distributed (IID) follow- 
ing a normal distribution with zero mean and a fixed 
variance. A typical regression problem consists of es- 
timating the value of p in M given the set of obser- 
vations (xi, yi), i = 1, . . . , N, where N > n p . This is 
often carried out by searching for the parameter real- 
ization that minimizes the sum of squared residuals, 
p LS . The precision of this estimate, which prescribes 
how much it can deviate from its “true value” within 
an epistemic framework, is often evaluated using con- 
fidence intervals (Seber and Wild 2003). The calcula- 
tion of confidence intervals, prediction intervals (i.e., 
intervals where future observations are expected to 
fall) and credible intervals (Kennedy and O’ Hagan 
2001 ) require having a probabilistic description of p. 
This description often requires (i) knowing/assuming 
a distribution for the prediction error, (ii) M and 77 
taking particular forms (e.g., M depends linearly on p 
and the noise 77 is additive) and/or (iii) M being accu- 
rately represented by a linear approximation in p. As 
such, the suitability of the resulting intervals depends 
tightly on the validity of such assumptions. 

A common approach to model calibration is 



Bayesian inference. In Bayesian inference the ob- 
jective is to describe the model’s parameters as a 
vector of possibly dependent random variables using 
Bayes’ rule (Kennedy and O’ Hagan 2001). The re- 
sulting vector, called the posterior, depends on an as- 
sumed prior random vector, and the likelihood func- 
tion; which in turn depends on the observations, and 
on the structure of M. Whereas this approach does 
not make any limiting assumptions on the manner in 
which M depends on p, nor on the structure of the 
resulting posterior; it requires that the calibrated vari- 
ables in p be epistemic. This vector might be com- 
prised of physical epistemic uncertainties and hyper- 
parameters of aleatory variables 1 . Note that the con- 
sideration of aleatory uncertainties requires assuming 
a structure for them, so they can be parameterized in 
terms of non-physical epistemic variables. The pres- 
ence of aleatory and model-form uncertainty yields 
uncertainty characterizations that fail to describe the 
prediction error. As more data is available, the cali- 
brated p approaches a deterministic quantity, and the 
model prediction y converges to a single function of 
the input. The offset between this function and the 
data is not captured by the model. This deficiency can 
be mitigated by adding a fictitious discrepancy term to 
M (Kennedy and O’Hagan 2001). This term, which 
can have a fixed epistemic or a fixed aleatory struc- 
ture, is calibrated as if it were part of M . In spite of its 
high computational demands, and of the potentially 
high sensitivity of the posterior to the assumed prior, 
this method is commonly regarded as a benchmark. 

In this paper, we do not use an error term such as 77, 
nor do we make prior assumptions about a distribution 
of p. What is here called a Random Predictor Model 
(RPM) has the general form y = M(x.p), where p is 
a random vector, so the output, y, is a random process 
parameterized by x. We do not fully specify the dis- 
tribution of p. Instead, we only seek to find a mean 
value, a variance, and, in some cases, a support set 
for p. These will be determined by solving optimiza- 
tion problems according to the input-output data and 
a scalar parameter chosen by the analyst. The role of 
this parameter is to limit the largest number of stan- 
dard deviations that can separate the data points from 
the mean prediction. The resulting description of p is 
chosen to be as tight as possible while satisfying this 
restriction. We further provide means of identifying 
outliers in the data set so that eliminating them from 
the modeling process can result in predictions hav- 
ing a narrower range at the expense of a reduction in 
the model’s reliability. As compared to previous work 
on interval predictor models (Crespo et al. 2014), the 
main contribution of this article is the consideration 
of random descriptions of p, thus of y, having an ar- 

1 For instance, if q contains the physical parameters of the 
model M, where <j\ is epistemic and <{2 is aleatory having a nor- 
mal distribution with mean y and standard deviation <7, the vec- 
tor p = [qi,y,cr\ T contains three epistemic variables, one phys- 
ical and two non-physical. 


bitrary structure, e.g., the results apply to p having a 
Normal, Beta or any other distribution. 

As in the Bayesian inference approach, the formu- 
lations proposed provide a crisp description of the un- 
certainty in the value of the model’s parameters. In 
contrast to the Bayesian approach however, the meth- 
ods in this paper do not require any prior descrip- 
tion of the uncertainty in p, and the resulting mod- 
els yield analytical characterizations for both the pre- 
dicted output and the model’s reliability. This paper 
focuses on (i) computational models that depend lin- 
early on the parameters and polynomially on the state, 
i.e., y = p T p(x) where ip(x) is a vector of monomials 
in the components of x, and (ii) uncertainty sets for p 
that are hyper-rectangular. The advantage of these sets 
over all other sets (such as the ellipsoidal sets used in 
(Campi et al. 2009)), is that each component of p can 
be selected arbitrarily in its interval independently of 
the choices made for any of the other parameters. As 
such, parameter interdependencies are avoided. This 
independence enables the calculation of RPMs whose 
parameters are independent random variables. These 
properties enable an analytical description of the pre- 
diction and a formal quantification of its reliability. 
Extensions to models having other dependencies on 
x have been made, but this paper will only focus on 
polynomials. 

This paper prescribes formulations for two types of 
RPMs. Whereas Type-1 RPMs use the entire data set, 
Type-2 RPMs neglect a fixed percentage of the obser- 
vations. Such observations, which are identified while 
the RPM is calculated, are regarded as outliers. The 
companion paper (Crespo et al. 2015) focuses on two 
other types of RPMs. 

2 PROBLEM STATEMENT 

A system is postulated to act on a vector of inputs to 
produce an output. The output can depend on the state 
variables and on some other influences, causing, for 
instance, intrinsic variability. Let X C M na: be a set of 
input variables, and Y C M. ny be a set of outputs which 
might result from the system acting on elements of X. 
In the following, the focus will be on the single-output 
(riy = 1) multi-input (n x > 1) case. 

It is desired to build a model of the Data Gener- 
ating Mechanism (DGM) which will predict the out- 
put corresponding to unobserved realizations of the 
input. The presence of intrinsic variability and un- 
certainty (e.g., the case in which some of the states 
that prescribe the measured output are unmeasurable 
or unknown to the analyst) makes it unreasonable to 
build a mathematical model that predicts a single out- 
put for a fixed input. Instead, an Interval Predictor 
Model (IPM) will predict an interval valued function 
into which the output from an unobserved input is ex- 
pected to fall, while an RPM will predict a random 
process matching key features of the data. Engineer- 
ing judgment is used to pick a collection of mono- 



mials in the state variables, (p(x), to use as basis func- 
tions. Data points Zi— (xi, yi) for i — 1 , . . . , N are ob- 
tained from observations of the system. Instead of the 
standard practice of fitting all of the data as closely 
as possible with a single vector p of parameters, the 
thrust in this work is to restrict as much as possible a 
set in M" 7 ' from which p is chosen while, at the same 
time, having the property that each data point (except, 
possibly, for a few outliers neglected purposely by the 
analyst) can be fit exactly by at least one element in 
such a set. One restriction to be considered is for p 
to belong to a set P. For a fixed value of the state x, 
the propagation of P through M yields an interval of 
output values. Thus, these models are called Interval 
Predictor Models. The objective here is to choose P 
to make the corresponding y intervals as small as pos- 
sible and still allow each data point (x l , y,) to be mod- 
eled as ip = p T ip(xi) for some p £ P. The other form 
of restriction considered is to describe p as a random 
vector. For a fixed value of the state x, the propaga- 
tion of this vector through M yields a random vari- 
able R y (x) for the outcome y at x. Various properties 
of R y (x), such as mean, variance, and support set, are 
determined by those of p. The thrust here is to choose 
a random vector that leads a prediction matching key 
features of the data. 

In this setting the two main problems of interest 
can be stated as follows. Let z = {^} = {(a:*, ?/*)}, for 
i — 1 , . . . , N, be a sequence of observations. First, we 
want to find an empirical model that, when evaluated 
at a new value xn+l of the state, returns an informa- 
tive prediction of the unobserved output jjn+i- An in- 
formative prediction can be interpreted as a prediction 
that is consistent with salient features of the data com- 
prising z. These features, which are cast by the analyst 
as design requirements (for example, we might want 
all observed outcomes to be less than 2-standard devi- 
ations from the mean prediction), are cast as inequal- 
ity constraints in the optimization problems used to 
calculate the model. Second, we want to quantify the 
probability that i)n+\ be compliant with such require- 
ments (in the previous example, we want to evaluate 
the probability that y ^+ 1 be less than 2-standard de- 
viations away from the mean prediction). 


3 INTERVAL PREDICTOR MODELS 

This section introduces basic concepts from IPMs that 
are essential for the construction of RPMs. Additional 
information on IPMs and examples are available in 
(Crespo, Kenny, & Giesy 2014). An IPM is simply a 
mapping that assigns an output interval for each value 
of the input. In the context of this paper, an IPM as- 
signs to each instance vector x £ X a corresponding 
outcome interval in Y. That is, an IPM is a set- valued 
map 

( 2 ) 


where a; is a state vector, and I y (x) is the prediction 
interval. Let M be any functional acting on a vector 
x of state variables and a vector p of parameters to 
produce an output y, i.e., y = M(x,p). A parametric 
IPM is obtained by associating to each x £ X the set 
of outputs y corresponding to all values of p in P: 

I y (x,P) = {y = M (x,p), p E P}. (3) 

I y (x,P) will be an interval as long as M(x,p ) is a 
continuous function of x and p, and P is a connected 
set. All instances of M and P considered in this paper 
satisfy these restrictions. Attention will be limited to 
the IPM given by 

Iy(x,P) = {y =p T p(x), p £ P}. (4) 

where <p(x) is a vector of monomials, and 

P = {p : p < p < p}. (5) 

The analyst is free to choose which monomials are 
relevant to the particular application. A general repre- 
sentation of a multivariate polynomial basis is 

<p(x) = [l,x t2 ,x 13 , . . . , at*"] 1 ” , (6) 

where x = x , , . . . , x Ux ] is the state, and the vector 
ij = [ij t i, . . . , ij : n x \, with ij ^ i k for j ^ k has the ex- 
ponents of the monomials 2 . 

The limits of the output of the IPM prescribed by 
(4-6) can be explicitly computed as 

I y (x,p,p) = [y(x,p,p), y(x,p,p)], (7) 

where 

y(x,P,p) = y(x) t (^Y=^j -<^(M) T (^~^j (8) 

y{x,P,p) = p(x) t +<^(M) T ( 9 > 

Therefore, the envelopes of the interval valued func- 
tion I y , are linear functions of p and p, and piecewise 
polynomial functions of the input.The spread of I y , 
which is the separation between its limits, is 

Sy(X,P,p) = (f(\x\) T (p-p). (10) 

Note that the spread depends on the size of the un- 
certainty box P, but is independent of its geometric 
center. 

Commonly, the Least Squares (LS) prediction, y = 
pj s p(x), where p LS is given by 

Pls = (A t A) 1 A T [yi,...,y N ] T , (11) 


2 The inclusion of 1 in p(x) guarantees that every (x, y) pair 
will be interpolated using some p even if x = 0. 


/ : x — > I y (x) C Y, 



for Aij = (pj(xi), for i = 1 .,N and j = 1, ... , n p 

is used to approximate the DGM. p LS minimizes the 
sum of the squares of the predicted errors. 

3.1 Type-1 IP Ms 

A Type-1 IPM is given by Equations (4-6) where P 
is the solution to the following Optimization Problem 
(OP). 

Optimization Problem 1. The limits of P are given by 
(p,P) = argmin {E x [5 y (x,p b ,p a )] : p a < p b , 

Pb,Pa 

y(xi,Pb,p * ) <yi< y(xi,p b ,p & )} , (12) 

where E x [-\ is the expected value operator with re- 
spect to the input x, and (xi, yf) for i — 1, . . . , N are 
the observations in z. 

Therefore, a Type-1 IPM yields a P that minimizes 
the expected interval spread such that all the observed 
outputs are within I y (x). When a: is a random vec- 
tor of known distribution, the cost function in (12) 
can be calculated analytically. Otherwise, the sample 
mean based on the data can be used to approximate 
it. The resulting IPM, which is calculated by solving 
the convex optimization problem in (12), admits a rig- 
orous reliability assessment (see Section 5). This as- 
sessment quantifies the probability that a future ob- 
servation will fall within I y (x). 

4 RANDOM PREDICTOR MODELS 

A RPM is a mapping that assigns to each input vector 
x E X a corresponding random variable in the output 
space Y. That is, an RPM is a random variable-valued 
map 

R : x — > R y (x ) C Y, (13) 

where x is the input, and R y (x) is a random process 
whose support lies in Y. A parametric RPM is ob- 
tained by associating to each x E X the set of outputs 
y corresponding to all values of p described by a ran- 
dom vector with joint Cumulative Distribution Func- 
tion (CDF) Fp(p) having the support set P. As before, 
attention will be limited to the case where the output 
is a linear function of the parameter p, and a polyno- 
mial function of x. This leads to 

Ry(x) = {y = p T <p(x), p : F p (p), p E P}. (14) 

Denote by p E M np , v E M np , and c E M n p( n p -1 )/ 2 the 

mean, variance and correlation of the random vector. 
The variance and correlation fully prescribe the co- 
variance matrix C(is,c) E M npXnp . It can be shown 
that any random vector with a support set P as in (5) 


must satisfy the consistency equations 3 


P < P < P, (15) 

0 <v < (p~p) © (p~p), (16) 

-1 < c < 1, (17) 

C(u,c)h 0. (18) 


The operator symbol in (16) denotes the component- 
wise product of vectors, and the symbol in (18) de- 
notes positive semidefiniteness. 

The random process R y {x ) is fully prescribed by 
the CDF of p. Naturally, statistics of the output y, 
such as the mean p y {x) = E p [y(x,p)\, the variance 
Vy{x) = E p [(y(x,p) — p y (x)) 2 ], and the range I y (x) = 
[mm p y(x,p),max p y(x,p)], vary with x. In particu- 
lar, the mean prediction is p y (x,p ) = p T ip(x), the 
output’s variance is 

u y (x,u,c) = p(x) T C(u,c)p(x), (19) 

and the output’s range is the interval value function 
(7). When the components of p are uncorrelated, (19) 
reduces to 4 

v y (x, u) = is T tp 2 (x). (20) 

A few metrics for characterizing R y (x) are intro- 
duced next. The u-surface, which connects all the out- 
puts y that are r standard deviations from the mean 
prediction, is defined by 

l{x,p,T, u,c) = p T <p(x) -\-T\Jv y (x,iy,c). (21) 

The cr-volume, defined as 

I a {x,p,T,v) = [l(x,p, -T,u,c),l(x,p,T,u,c )\ , (22) 

contains all the outputs y that are no more than r stan- 
dard deviations away from the mean prediction p y (x). 
For the value of r to be feasible (i.e., for the cr-surface 
to be within the support of R y ), it must satisfy 

V (x, p, p) < l (x, p, t, v, c) < y(x, p, p) . (23) 

Equation (23) ensures that the support of the process 
contains outcomes that are up to r standard deviations 
from the mean prediction. 

The formulations that follow prescribe key statis- 

3 The upper bound in (16) results from applying the expected 
value operator E Pi [•] to both sides of pj < (p . + p t )Pi ~ P-Pi> 
which holds for all p, E [p , p.f and using = E p . [pf] — pf for 

i 1 , • • ■ , Up . 

4 When the correlation c is zero, the corresponding argument 
of any function depending on it will be dropped from the nota- 
tion. 



tics of p, thus of the random output y(x), based on 
input-output data. As such they encompass all RPMs 
that conform to these statistics. The first two of four 
types of RPMs are proposed here. Type-1 RPMs pre- 
scribe the mean and variance of R y (x ) when the en- 
tire data set is used. Type-2 RPMs prescribe the same 
statistics after eliminating the effects of a fixed per- 
centage of the observations. Such observations, which 
can be regarded as outliers, are worst-case in the sense 
that their removal tightens the predicted range of the 
a-volume the most. The formulations below only con- 
sider the uncorrelated case c = 0. Extensions to the 
correlated case can easily be made. Furthermore, the 
selection of p as p LS is arbitrary, and any other value 
can be used. In the developments that follow, the per- 
formance of an RPM refers to the property evaluated 
by the cost function in the corresponding OP. 

4.1 Type-1 RPMs 

Type-1 RPMs prescribe the mean and variance of 
R y (x ) when the entire data set in z is used. A Type-1 
RPM is given by Equations (6, 14), where p is a vector 
of uncorrelated random variables with expected value 
p = Pls, and a variance u — 0, given by the solution 
to the following program. 

Optimization Problem 2. The variance ofp is equal to 
v = argmin {E x [u y (x, u)} : l(x i} p, -a inax , v) < ip < 

v>0 

l(xi , p, a max , v) for i = 1, . . . , N} , (24) 

where cr max > 0 is a parameter prescribed by the ana- 
lyst, and ( Xi , yf) for i — 1, , . . , N are the observations 
in z. 

Hence, a Type 1-RPM minimizes the expected vari- 
ance of the random process R y (x) such that all ob- 
servations are no more than <r max standard deviations 
away from the mean prediction; i.e., all observations 
are within the cr-volume I a (x, p, <7 max , z>). 

The dependence of z> on cr max is studied next. Equa- 
tion (24), which is subject to 2N + n p inequality con- 
straints, is equivalent to the linear program 

0 = argmin {v T E x [p 2 (x)\ : al m u T p 2 {x i ) > 

V 

[yi - p T <p(xi)Y fori = 1,..., TV, (25) 

which is subject to N + n p constraints. The constraint 
set in (25) scales inversely with a 2 , so the scaled 
optimal objective value a^v T E x [tp 2 {x)\, is constant 
as cr max varies. It follows that that the larger <r max , 
the smaller ||z>||, and the larger the number of stan- 
dard deviations separating any given point (x, y) from 
the corresponding mean prediction. This observation 
has consequences for the I a resulting from this for- 
mulation. If z> i is the solution to (25) corresponding 


to cr max ,i, and z> 2 = adi where a = (cr max ,i/cr max , 2 ) 2 , 

then z> 2 is the solution to (25) corresponding to <r max , 2 - 
Consequently, the cr-volumes I c {x, p, cr maXj i, uf), and 
I a (x, p, o- maXj 2 , ^ 2 ) are equal. Hence, the cr-volume, 
I a (x, p, <r max , i>) is independent of the choice of a max . 

Note that both Type-1 IPMs and Type-1 RPMs re- 
quire solving a convex OP. As such they can effi- 
ciently handle hundreds of thousands of data points, 
thus, many more input dimensions than alternative 
metamodels. This is in sharp contrast to Gaussian Pro- 
cesses which are limited to a few thousand data points 
before becoming numerically intractable. 

A Type-1 RPM does not prescribe the support of 
p, thus, of R y (x). Any random vector satisfying the 
consistency Equations (15-18) for p = p LS and v = z> 
is a valid characterization of F p {p). Since Type-1 
RPMs are calculated by solving a convex optimiza- 
tion problem, they admit a rigorous reliability assess- 
ment. This assessment, presented in Section 5, quan- 
tifies the probability that a future observation will fall 
inside cr-volume I a (x,p LS , cw, *>)• 

Example 1: Consider the DGM y = x 2 cos(x) — 
sin(3a;)e -3: — cos(x 2 ) + x(g — 1), where x G M is 
an IID sequence of random variables with uni- 
form distribution over A" = [—5.5, 5.5], and g is 
IID with a standard normal distribution. Note that 
no knowledge on the structure of the DGM is re- 
quired to calculate the RPMs. A data sequence z 
for N = 150 observations was generated. In the de- 
velopments that follow we assume that n p =l . In 
(Crespo et al. 2014) we calculate several IPMs 
based on the same data sequence. The LS solu- 
tion is p LS = [-0.8734,-1.1059,-0.9926,0.0026, 
-0.0228, -0.0004, 0.0028] t . 

A Type-1 RPM for rr max = 1, to be referred to as 
RPM A, is shown in Figure 1. This figure shows 
the observations (x’s), the mean prediction p y {x) 
(solid line), as well as cr-surfaces (green dashed- 
dotted lines) in increments of 0.5 standard deviations. 
Note that the observation near (1, —15) limits the a- 
volume from below. The only significant variance in z> 
is z>! = 180.3824. The performance of RPM A, E x [u y ] 
is practically equal to u 1 . Note that 143 out of the 150 
observations are within the cr = 0.5 volume. Further 
notice that the number of standard deviations between 
an arbitrary point (x, y ) and the mean prediction for 
the same value of x, can be reduced by enlarging a. 
This can be attained by reducing a max . 

4.1.1 Identification of Outliers 
The presence of outliers in the data yields undesirably 
large cr-volumes and output ranges, diminishing the 
RPMs performance. Whereas the limits of the opti- 
mal I a might be prescribed by a few observations, the 
majority of them might be much closer to the mean 
prediction, e.g., RPM A. The outliers, whose removal 
from the data set will lead to smaller predicted vari- 
ances, can be identified using anyone of several fig- 





Figure 1: RPM A: Type-1 RPM for for <r max = 1. 


Figure 2: RPM B: Type-1 RPM after the removal of outliers. 


ures of merit. In this paper we will use the figure of 
merit 


C) 


(yi- H T p(xj )) 2 

v y (xi,v,c) 


(26) 


where u is the variance of p. n, is a variance- 
normalized distance squared between the ith observed 
output and the mean prediction at the corresponding 
input. Outliers will be identified by determining the 
data points corresponding to the largest percentiles of 
the empirical CDF of k, F^)(n), for i — 1, . . . , N, 
i.e., ( Xi,yi ) is an outlier if F K ^){^i) > A where 0 -C 
A < 1. Once the outliers are identified, they can be 
removed from the data sequence and a new Type-1 
RPM will be calculated. The resulting RPM will at- 
tain tighter predictions for a A fraction of the obser- 
vations in z, while the prediction for the remaining 
1 — A fraction might be considerably degraded. The 
outliers found by this procedure will be the same re- 
gardless of the value of cr max . This is a consequence of 
the following observation. If («j, F K (ni)) are points 
on the optimal CDFs corresponding to <r maXj i, the 
points on the optimal CDF corresponding to cr max2 are 
(aKi,F K (ni)), where a was defined earlier. 


Example 2: We now derive a Type-1 RPM for cr max = 
1 after removing seven outliers from the original data 
set. These outliers attain the largest values of n,. The 
resulting RPM, to be referred to as RPM B, is shown 
in Figure 2. In this case there are seven observations 
outside the a = 1 volume by design (shown with cir- 
cled cross symbols), 114 within the a = 0.5 volume, 
and the remaining 29 are inside the a = 1 volume and 
outside the a = 0.5 volume. The only sizable vari- 
ances for RPM B are f>i = 44.5139, and z> 2 = 0.5194. 
The performance of RPM B, E x [u y ] = 49.2469, is 
72.7% better than that of RPM A. 


4.2 Type-2 RPMs 


N observations available. The observations compris- 
ing the removed set are worst-case in the sense that 
their removal tightens the optimal a- volume the most. 
Whereas the outliers removed to construct RPM B are 
worst-case for the value of z> corresponding to RPM 
A only, those neglected in a Type-2 RPMs are worst- 
case for the varying value of v being considered dur- 
ing the optimization. This will be carried out without 
removing any point in the data sequence. 

In particular, a Type-2 RPM is given by Equations 
(6, 14), where p is a vector of uncorrelated variables 
with expected value p = p LS , and a variance v = D 
given by the following OP. 

Optimization Problem 3. The variance of p is equal to 
v = argmin {E x [u y (x, v)\ : F k[ij) (ol mx ) > A} , (27) 

where cr max > 0 is a parameter prescribed by the an- 
alyst, is the empirical CDF of n(v) in (26) 

based on the N observations in z, and 0 < A < 1, 
another parameter to be chosen by the analyst, is 
the proportion of observations to be contained by 

d (j (x, p , O’ maxi ^ l ('^))- 

Hence, a Type-2 RPM minimizes the expected vari- 
ance of the random process R y (x) such that 100 A% 
of the observations are no more than cr max standard 
deviations apart from the mean prediction. The tight- 
ening of the prediction for 100A% of the observations 
caused by (27) yields a cr-volume I a (x , p, cr max , z>(A)) 
that does not enclose the remaining 100(1 — A)%. 
This shows that (27) is a chance-constraint formula- 
tion (Chames et al. 1958), in which one is willing to 
accept the occurrence of unfavorable low-probability 
events (probability 1 — A) for the sake of an improved 
performance for high-probability events (probability 
A). As with Type-1 RPMs, a max is essentially a scal- 
ing factor. 

OP3 is a non-convex formulation, which for A = 
1 yields the same RPM as OP2 5 . When A < 1, 


A formulation leading to an alternative RPM is pre- 5 Note that, if E x [-\ is calculated based on a sample mean, the 

sented next. In contrast to Type-I RPMs, this approach entire z sample must be used to make the convex formulation 
searches for v by using only a fixed percentage of the equivalent to (27). 


a fixed number of observations (outliers) are ne- 
glected as the RPM is being calculated. Outliers 
can be easily identified by finding the data points 
for which F K ^ (rtfb)) > A. The points not satis- 
fying this condition, which are the elements of z 
within I c T (x, p, <r max , F(A)), constitute the sequence w. 
A Type- 1 RPM based on the data sequence w is equiv- 
alent to the Type-2 RPM in (27) based on the data se- 
quence z. This relationship enables performing a reli- 
ability assessment of Type-2 RPMs. This assessment, 
presented in Section 5, quantifies the probability that a 
future observation will be within I a (x, p, cr max , /7(A)). 

Example 3: We now derive a Type-2 RPM for the 
same observations used earlier having A = 143/150 
and c max = 1. As with RPM B, we search for a a- 
volume for which 143 observations are less than one 
standard deviation from the mean prediction. The re- 
sulting RPM, shown in Figure 3, will be referred to 
as RPM C. Note that the process is more concentrated 
about the LS prediction than either RPM A or RPM 
B. The only sizable components of D are z>i = 6.0124, 
and z> 2 = 3.2985. Note that the outliers, which are 
the observations outside I„{x, ft . 1, /7(A)), differ from 
those corresponding to RPM B. The performance of 
RPM C, E x [v v \ = 37.7676, is 23% better than that of 
RPM B. 

Figure 4 shows the empirical CDFs of rc(z>) for 
RPM A, B and C. The horizontal line A = 143/150 
is shown in green. Recall that the larger the expected 
value of k, the more concentrated F p (p), and the bet- 
ter the RPM. Using the figure of merit E[k\k < cr max ], 
which is the area between the constant function A and 
the CDF in the domain k G [0, cr max ], the ranking from 
best to worst is RPM C, RPM B, and RPM A. The 
advantage of RPM C is also reflected in the values of 
E x [u y ] listed above. The vertical jumps in the CDFs 
at k = c max are the result of obtaining an optimum for 
which several observations are on the boundary of the 
tW-volume. 

The evaluation of the same figure of merit above 
over the domain k G [cr max , oo] indicates that RPM B is 
the best model of the three. This can be inferred from 
Figure 4 by noting that the CDF of RPM B assumes 
the largest values of n for most of the probability val- 
ues exceeding A. This illustrates that (27) is a chance- 
constraint formulation, in which one is willing to ac- 
cept the occurrence of unfavorable low-probability 
events (i.e., those in the k G [a max ,oo] range) for the 
sake of an improved performance for high-probability 
events (i.e., those in the n G [0, a max ] range). 

5 MODEL’S RELIABILITY 

This section presents a framework for rigorously eval- 
uating the reliability of the predictor models proposed 
above. The reliability of model £, r(£), is the proba- 
bility that a future observation will be compliant with 
the requirements imposed upon the calculation of the 



Figure 3: RPM C: Type-2 RPM for A = 143/150 and cr max = 1. 



Figure 4: CDFs of k(u) for RPM A (red, dashed-dotted), B (blue, 
dashed) and C (black, solid). 

model. These requirements are cast in terms of an out- 
put y belonging to a a- volume J CT (x) for both Type-1 
and Type-2 RPMs. The developments that follow are 
based on the Scenario Approach (Calafiore & Campi 
2006). 

Denote by P the unknown distribution of the DGM 
from which the points of the data sequence z are ob- 
tained. P can be interpreted as a probabilistic cloud 
in the X x Y -space. The case in which y is a deter- 
ministic function of x only is a particular case where 
P is concentrated over the function. A general P can 
accommodate situations where the fluctuation in the 
outcome y is caused by sources other than x. No as- 
sumption is made on P so that the functional form 
relating x and y can be arbitrary. The following theo- 
rem, taken from (Campi, Calafiore, & Garatti 2009), 
permits quantifying the reliability of an empirical pre- 
dictor model whenever the OP used for its calculation 
is convex. 

Theorem 1. Let z = {^} = {{x^yf)}, for i = 
1, ... ,N, be an independent data sequence resulting 
from a stationary discrete-time data generating pro- 
cess. Suppose the model £ is calculated by solving 
a convex constrained optimization problem having a 
unique solution. Furthermore, assume that k obser- 
vations ( outliers ) out of the N available have been 
discarded when calculating the model. Then, for any 
e G (0, 1) and assuming k < N — d, where d is the 


number of optimization variables used to calculate £, 
it holds that 

Probpjv [r{8) > 1 — e] > 1 — /3, where (28) 

„ {N — d)\ e 

' {N — d)\d\ ^(N -d-i)W.(l-eY' ’ 

This theorem provides an assessment of unob- 
served data. The theorem states that the reliability of 
£ is no worse than 1 — e with probability greater than 
1 — (3. As for the probability 1 — f3, one should note 
that £ is a random model by virtue of the random- 
ness in P prescribing z. Therefore, its reliability can 
be greater than or equal to 1 — e for some random ob- 
servations but not for others, and f3 refers to the prob- 
ability P A = P x • • • x P of observing a bad set of N 
samples such that the reliability of the model is less 
than 1 — e. Parameter e is referred to as the reliabil- 
ity parameter while (3 is the confidence parameter. It 
is worth noting that the confidence parameter can be 
made small enough that it losses any practical signif- 
icance and r(£ ) > 1 — e. This can be done without 
letting N be too large because /3 vanishes exponen- 
tially with N. 

The reliability of a Type-1 IPM, to be denoted as 
X, is defined as r(Z) = Probp [(x,y) G I y (x,p,p)]. 
Hence, r(X) is the probability that an unobserved 
input-output pair (x,y) will fall within the range 
I y (x). The convexity of the OP1 enables the direct 
application of Theorem 1. The reliability of any Type- 

1 or Type-2 RPM, to be denoted as 7Z, is defined as 
r(JZ) = Probp [(x,y) G I a (x, p, cr max , z>(A))]. Hence, 
r('R) is the probability that an unobserved input- 
output pair (x. y ) will fall in the optimal a- volume 
corresponding to cr max . 

The convexity of OP2 enables the direct applica- 
tion of Theorem 1 to Type-1 RPMs. This includes 
the cases in which none ( k = 0) and some ( k > 0) 
of the observations are removed from the data set in 
advance. In contrast to OP2, OP3 is non-convex. This 
opens the possibility of (27) having multiple optima. 
Multiple optima may result from the possibility of ob- 
taining the same RPM for different sets of outliers. 
Because Type-2 RPMs are calculated by solving a 
non-convex program, Theorem 1 cannot be applied 
directly. However, the reliability of such models can 
be establish by using the Principle of Equivalence. 
This principle is based on identifying an auxiliary 
convex formulation that will result in the very same 
empirical model found by solving the non-convex 
formulation. If this is attained, the reliability of the 
model, which is independent of the means used to cal- 
culate it, can be rigorously evaluated via the auxiliary 
formulation. This approach can be applied to Type- 

2 RPMs. In particular, the solution to OP3 using the 
original the data sequence z for a given value of A is 
equivalent to the solution of OP2, which is a convex 
program, with the data sequence w. Because only the 


N — k* elements in w, where 

k* = floor[7V(l- A)], (30) 

are required by the auxiliary program, the reliability 
of Type-2 RPMs is given by (28) with k = k* in 
(29). These k* observations fall outside the optimal 
a - volume and satisfy F k ^(k) > A. 

Example 6: The reliability of RPM A, B and C, 
for which (28) is directly applicable, is considered 
first. The reliability of RPM A, which is a Type-1 
RPMs calculated using N = 150, k = 0 and d = 7, 
is no less than 1 — e = 0.8050 with confidence 
1 — ft = 0.99; while the reliability of RPM B, which 
is also a Type-1 RPM for which N = 150, k = 7 and 
d = 7, is no less than 1 — e = 0.6984 with confidence 
1 — fj — 0.99. Hence, the exclusion of seven outliers 
rendered an improvement in the system performance 
of 72.7% at the expense of a reduction in the model’s 
reliability of 10.66%. The reliability of RPM C, 
which is a Type-2 RPM for which k = k* = 7, and 
the reliability of RPM B are the same, even though 
the performance of RPM C is 23% better. 

6 CONCLUSIONS 

This and the companion paper (Crespo et al. 2015) 
develop techniques for constructing random predictor 
models based on data. The formulations proposed en- 
able a rigorous characterization of key features of the 
predicted output, and of the reliability of the predic- 
tion. These articles set forth a new paradigm for the 
construction of empirical models in which the mod- 
els performance and reliability can be evaluated and 
traded off using rigorous means. 
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